Enh: Now able to handle "large output" from nrpe agent. #16

Open: wants to merge 1 commit into base: master

gst
Contributor

gst commented May 20, 2015

No description provided.

@xorpaul
Contributor

xorpaul commented Jun 23, 2015

I finally had time to test your patch. Unfortunately it does not work :(

On the second NRPE.read() it fails to unpack the data from the buffer, because the second buffer is too small:

[1435076655] ERROR: [Shinken] handle_read()
[1435076655] ERROR: [Shinken] _handle_read()
[1435076655] ERROR: [Shinken] handle_read()
[1435076655] ERROR: [Shinken] _handle_read()
[1435076656] ERROR: [Shinken] handle_read()
[1435076656] ERROR: [Shinken] _handle_read()
[1435076656] ERROR: [Shinken] len(buf): 1034
[1435076656] ERROR: [Shinken] got len(data): 1034
[1435076656] ERROR: [Shinken] using as datalen: 1024
[1435076656] ERROR: [Shinken] got message: OK - everything looks okay - v3.2|load1=0.000;3.000;5.000;0; load5=0.010;3.000;5.000;0; load15=0.050;3.000;5.000;0; %idle=97%;10;5;0 %user=2%;90;95;0 %system=1%;90;95;0 %iowait=0%;90;95;0 ram_free_percent=92%;10;5;0;100 ram_free=1848MB;201;100;0;2012 ram_buffers=205MB ram_cached=1209MB ram_slabr=178MB /_used_%=11%;90;95 /=2011MB;17885;18879;0;19873 uptime=1754049s;1800;;0 tasks_total=104 tasks_running=1 tasks_sleeping=103 tasks_stopped=0 tasks_zombie=0 ntp_offset=-0.000165s;40;100 oom_killer_lines=0 networkq_connections=0  mailq_count=0;500;10000 
OK - load average: 0.00, 0.01, 0.05</br>
OK - idle: 97% user: 2% system: 1% iowait: 0% interactive_mode: false</br>
OK - 92% RAM free: 1848MB total: 2012MB</br>
DISK OK - free space: / 16852 MB (89% inode=90%);</br>
OK - Uptime 20 days 7 hours 14 minutes 9 seconds</br>
OK: Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie</br>
OK - NTP offset: -0.000165 seconds against 111.22.333.444</br>
PROCS OK: 1 process with UID = 101 (ntp), command name 'ntpd'</b
[1435076656] ERROR: [Shinken] got package type 3
[1435076656] ERROR: [Shinken] got state: received
[1435076656] ERROR: [Shinken] got prev_state: creation
[1435076656] ERROR: [Shinken] rc: 0 , message: OK - everything looks okay - v3.2|load1=0.000;3.000;5.000;0; load5=0.010;3.000;5.000;0; load15=0.050;3.000;5.000;0; %idle=97%;10;5;0 %user=2%;90;95;0 %system=1%;90;95;0 %iowait=0%;90;95;0 ram_free_percent=92%;10;5;0;100 ram_free=1848MB;201;100;0;2012 ram_buffers=205MB ram_cached=1209MB ram_slabr=178MB /_used_%=11%;90;95 /=2011MB;17885;18879;0;19873 uptime=1754049s;1800;;0 tasks_total=104 tasks_running=1 tasks_sleeping=103 tasks_stopped=0 tasks_zombie=0 ntp_offset=-0.000165s;40;100 oom_killer_lines=0 networkq_connections=0  mailq_count=0;500;10000 
OK - load average: 0.00, 0.01, 0.05</br>
OK - idle: 97% user: 2% system: 1% iowait: 0% interactive_mode: false</br>
OK - 92% RAM free: 1848MB total: 2012MB</br>
DISK OK - free space: / 16852 MB (89% inode=90%);</br>
OK - Uptime 20 days 7 hours 14 minutes 9 seconds</br>
OK: Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie</br>
OK - NTP offset: -0.000165 seconds against 111.22.333.444</br>
PROCS OK: 1 process with UID = 101 (ntp), command name 'ntpd'</b
[1435076656] ERROR: [Shinken] handle_read()
[1435076656] ERROR: [Shinken] _handle_read()
[1435076656] ERROR: [Shinken] len(buf): 2
[1435076656] ERROR: [Shinken] got len(data): 2
[1435076656] ERROR: [Shinken] using as datalen: 2
[1435076656] ERROR: [Shinken] rc: 3 , message: Error : cannot unpack output ; datalen=2 : err=unpack requires a string argument of length 1034
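For reference, here is a minimal sketch of decoding one NRPE v2 packet with Python's `struct` module. `struct.unpack()` only accepts input whose length equals the size of the format, which is why a 2-byte buffer cannot be unpacked as a full packet. The layout is an assumption based on NRPE's C `packet` struct: int16 version, int16 type, uint32 CRC32, int16 result code, and a 1024-byte output buffer (1034 bytes of fields). The 2 trailing pad bytes, which a C compiler would typically add for alignment to give 1036 bytes on the wire, are also an assumption, though they would fit the `len(buf): 2` read in the log above. `decode_packet` and the constants are hypothetical names, and the CRC check assumes NRPE's CRC32 matches `zlib.crc32`.

```python
import struct
import zlib

# Assumed NRPE v2 wire layout (network byte order, no implicit alignment):
# int16 version, int16 type, uint32 crc32, int16 result code,
# 1024-byte output buffer, then 2 bytes of C struct padding ("2x").
PACKET_FMT = "!hhIh1024s2x"
PACKET_SIZE = struct.calcsize(PACKET_FMT)  # 1036 under these assumptions


def decode_packet(data):
    """Decode exactly one packet; raise ValueError on a short or corrupt read."""
    if len(data) != PACKET_SIZE:
        # struct.unpack would fail here too: it needs exactly PACKET_SIZE bytes.
        raise ValueError("need %d bytes, got %d" % (PACKET_SIZE, len(data)))
    version, ptype, crc, rc, buf = struct.unpack(PACKET_FMT, data)
    # Assumption: the CRC covers the whole packet with the crc field zeroed.
    zeroed = data[:4] + b"\x00\x00\x00\x00" + data[8:]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch")
    # The output is NUL-terminated inside the fixed 1024-byte buffer.
    output = buf.split(b"\x00", 1)[0].decode("utf-8", "replace")
    return version, ptype, rc, output
```

A 1034-byte read is rejected rather than half-decoded, so the caller has to keep buffering until the remaining bytes arrive.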

If I have the time, I'll write a small Python script using the asyncore module to query the NRPE daemon. Maybe that will help.

@gst
Contributor Author

gst commented Jun 24, 2015

Which exact NRPE server are you using?

@gst
Contributor Author

gst commented Jun 24, 2015

Note: on my side I had no access to any such server (one supporting the large buffer), so I made this PR using only the references I mentioned in the other thread, without actually testing it. So we should have expected first results like these ;)

@xorpaul
Contributor

xorpaul commented Jun 24, 2015

I'm using an up-to-date NRPE server with the opscode patch: https://web.archive.org/web/20100123200325/http://altinity.blogs.com/dotorg/nrpe_multiline.patch
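With a multiline patch like this one, the server's output is presumably split across several packets that the client has to join. A minimal sketch, assuming packet type 3 means "response chunk, more packets follow" (the log above does show `got package type 3`) and type 2 is the final response; both type values and the names `join_multiline`, `RESPONSE_PACKET` and `RESPONSE_PACKET_WITH_MORE` are assumptions here, not taken from the patch itself. It works on already-decoded `(type, rc, output)` tuples, so it is independent of how the bytes were read.

```python
# Assumed packet types: 2 = final response packet,
# 3 = response chunk with more packets still to follow.
RESPONSE_PACKET = 2
RESPONSE_PACKET_WITH_MORE = 3


def join_multiline(packets):
    """Join decoded (ptype, rc, output) tuples up to the final packet.

    Returns (rc, full_output), taking rc from the final packet.
    Raises ValueError on an unknown type or if the stream ends early.
    """
    parts = []
    for ptype, rc, output in packets:
        if ptype not in (RESPONSE_PACKET, RESPONSE_PACKET_WITH_MORE):
            raise ValueError("unexpected packet type %d" % ptype)
        parts.append(output)
        if ptype == RESPONSE_PACKET:
            return rc, "".join(parts)
    raise ValueError("stream ended before the final packet")
```

A client that stops after the first packet would only ever see the first 1024-byte chunk, which is consistent with the truncated message in the log.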

I managed to create a python script using the asyncore module that works with such an nrpe server: https://github.com/xorpaul/shinken_nrpe/blob/master/check_nrpe.py

In this script I'm using asyncore.loop().
I don't know where the NRPE booster loops over the remote socket. My theory is that it currently (even with this code change) reads its 1034 bytes from the remote socket only once.
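If that theory holds, the usual fix is to buffer reads rather than loop explicitly: each readable event appends whatever `recv()` returned, and packets are only handed on once a full one is available. A minimal sketch, assuming fixed-size packets of 1036 bytes (1034 bytes of fields plus 2 bytes of assumed struct padding); `PacketBuffer` is a hypothetical helper, not part of mod-booster-nrpe, and it deliberately avoids importing asyncore.

```python
PACKET_SIZE = 1036  # assumed NRPE v2 packet size incl. 2 bytes of struct padding


class PacketBuffer(object):
    """Accumulate raw socket reads and hand out whole fixed-size packets.

    An asyncore-style handle_read() could call feed(self.recv(4096)) on each
    readable event instead of assuming one recv() returns one whole packet.
    """

    def __init__(self, packet_size=PACKET_SIZE):
        self.packet_size = packet_size
        self._pending = bytearray()

    def feed(self, chunk):
        """Add a chunk of bytes; return a list of complete packets (bytes)."""
        self._pending.extend(chunk)
        packets = []
        while len(self._pending) >= self.packet_size:
            packets.append(bytes(self._pending[:self.packet_size]))
            del self._pending[:self.packet_size]
        return packets

    def pending(self):
        """Number of buffered bytes not yet forming a complete packet."""
        return len(self._pending)
```

For example, feeding 1034 bytes returns no packets, and feeding the next 2 bytes returns one complete 1036-byte packet.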

We would just need to add a loop somewhere in the NRPE booster, but I don't know where it should go. My best guess is somewhere in the Nrpe_poller class.

@gst
Contributor Author

gst commented Jun 24, 2015

Damn, I checked your code, and as far as I can tell it should behave the same.

Yeah, this mod-booster-nrpe code isn't entirely clear and simple. It could be improved in that respect (I imagine it could be made smaller, and clearer, by something like 25% of its LOC).

> I don't know where the nrpe booster is looping over the remote socket.

It's in the Nrpe_poller class: the do_work method contains the "main" loop of the module, which calls asyncore.poll2(): https://github.com/shinken-monitoring/mod-booster-nrpe/blob/master/booster_nrpe/booster_nrpe.py#L512

And that's it. (Calling asyncore.poll2() in a loop is equivalent to asyncore.loop(use_poll=True) ;) )

@xorpaul
Contributor

xorpaul commented Jun 24, 2015

Okay, then I'm also at a loss as to why the second self.recv() doesn't receive the rest of the output.
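One general point that may be relevant: `socket.recv(n)` returns *up to* `n` bytes, so a single call can legitimately deliver only part of a packet, with the rest arriving on a later call. The sketch below shows the classic blocking "read exactly n bytes" loop; `recv_exact` is a hypothetical helper, and unlike the booster's non-blocking asyncore dispatcher it blocks, so it only illustrates the `recv()` semantics rather than a drop-in fix.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes from a blocking socket.

    sock.recv(n) may return fewer than n bytes (e.g. 1034, then 2 more),
    so keep reading until n bytes have been collected.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed the connection before n bytes arrived
            raise EOFError("expected %d more bytes" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Simulate a packet arriving in two pieces over a local socket pair.
    a, b = socket.socketpair()
    a.sendall(b"x" * 1034)
    a.sendall(b"yy")
    data = recv_exact(b, 1036)
    print(len(data))
    a.close()
    b.close()
```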

I just changed my script to use asyncore.poll2(), and, as you said, it still works: https://github.com/xorpaul/shinken_nrpe/compare/poll2?expand=1

I also tried replacing the asyncore.poll2() with asyncore.loop() in the Nrpe_poller class, but with the same result :/

Well, no point in losing sleep over this. It's a cosmetic enhancement anyway. As long as my status line plus perfdata stays under 1 kB, everything is okay.

In the long run, I will replace the NRPE server with my own Go clone anyway: https://github.com/xorpaul/gorpe. It uses simple HTTP, so it should be easy enough for me to provide a booster-gorpe module for Shinken :)


2 participants